An adaptive algorithm for clustering cumulative probability distribution functions using the Kolmogorov-Smirnov two-sample test

Authors

  • Llanos Mora López
  • Juan Mora
Abstract

This paper proposes an adaptive algorithm for clustering cumulative probability distribution functions (c.p.d.f.'s) of a continuous random variable, observed in different populations, into the minimum number of homogeneous clusters, making no parametric assumptions about the c.p.d.f.'s. The proposed distance function for clustering c.p.d.f.'s is based on the Kolmogorov-Smirnov two-sample statistic. The corresponding test is able to detect differences in position, dispersion or shape of the c.p.d.f.'s. In our context, this statistic allows us to cluster the recorded data with a homogeneity criterion based on the whole distribution of each data set, and to decide whether or not more clusters are needed. In this sense, the proposed algorithm is adaptive: it automatically increases the number of clusters only as necessary, so there is no need to fix the number of clusters in advance. For each cluster, the algorithm outputs the common c.p.d.f. of all observed data in the cluster (the centroid) and the Kolmogorov-Smirnov statistic between the centroid and the most distant c.p.d.f. in that cluster. The proposed algorithm has been applied to a large data set of solar global irradiation spectra distributions. The results make it possible to reduce the information contained in more than 270,000 c.p.d.f.'s to only 6 clusters, corresponding to 6 different c.p.d.f.'s.
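The idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the single-pass "leader" assignment rule, the significance level `alpha`, the asymptotic critical value, and the names `ks_statistic`, `ks_critical` and `adaptive_ks_clustering` are all assumptions made for illustration. What it takes from the abstract is the distance (the two-sample Kolmogorov-Smirnov statistic), the centroid (the empirical c.p.d.f. of all data pooled in a cluster), the rule that a new cluster is opened only when no existing one is homogeneous with the sample, and the per-cluster output of the statistic between the centroid and its most distant member.

```python
import bisect
import math


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # The supremum of the difference of two step functions is reached
    # at one of the observed data points.
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)
        fb = bisect.bisect_right(b, x) / len(b)
        d = max(d, abs(fa - fb))
    return d


def ks_critical(n, m, alpha=0.05):
    """Asymptotic critical value of the two-sample KS statistic (assumed rule)."""
    c = math.sqrt(-0.5 * math.log(alpha / 2.0))
    return c * math.sqrt((n + m) / (n * m))


def adaptive_ks_clustering(samples, alpha=0.05):
    """Single-pass sketch: join the closest homogeneous cluster, else open one."""
    clusters = []  # each cluster: pooled data (centroid) and member indices
    for idx, sample in enumerate(samples):
        best, best_d = None, None
        for cluster in clusters:
            d = ks_statistic(sample, cluster["pooled"])
            limit = ks_critical(len(sample), len(cluster["pooled"]), alpha)
            if d <= limit and (best_d is None or d < best_d):
                best, best_d = cluster, d
        if best is None:
            # No existing cluster is homogeneous with this sample: add one.
            clusters.append({"pooled": list(sample), "members": [idx]})
        else:
            best["pooled"].extend(sample)
            best["members"].append(idx)
    # KS statistic between each centroid and its most distant member.
    for cluster in clusters:
        cluster["radius"] = max(
            ks_statistic(samples[i], cluster["pooled"]) for i in cluster["members"]
        )
    return clusters


if __name__ == "__main__":
    low = [float(x) for x in range(100)]
    high = [x + 1000.0 for x in range(100)]
    for c in adaptive_ks_clustering([low, high, [x + 0.5 for x in low]]):
        print(c["members"], round(c["radius"], 3))
```

In this sketch the result depends on the order in which the samples are visited, and the centroid changes as data are pooled; a fuller implementation would likely iterate until the assignments stabilise.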

Similar resources

Fitting Tree Height Distributions in Natural Beech Forest Stands of Guilan (Case Study: Masal)

This research investigated the modeling of tree height distributions of beech in the natural forests of Masal, located in Guilan province. The inventory was carried out using systematic random sampling with a grid of 150×200 m and sample plots of 0.1 ha. The DBH and heights of 630 beech trees in 30 sample plots were measured. Beta, Gamma, Normal, Log-normal and Weibull prob...

A fast algorithm for two-dimensional Kolmogorov-Smirnov two sample tests

By using the brute force algorithm, the application of the two-dimensional two-sample Kolmogorov–Smirnov test can be prohibitively computationally expensive. Thus a fast algorithm for computing the two-sample Kolmogorov–Smirnov test statistic is proposed to alleviate this problem. The newly proposed algorithm is O(n) times more efficient than the brute force algorithm, where n is the sum of the...

Fuzzy Empirical Distribution Function: Properties and Application

The concepts of cumulative distribution function and empirical distribution function are investigated for fuzzy random variables. Some limit theorems related to such functions are established. As an application of the obtained results, a method of handling fuzziness upon the usual method of Kolmogorov–Smirnov one-sample test is proposed. We transact the α-level set of imprecise observations in ...

Modified Kolmogorov-Smirnov Test of Goodness of Fit

A modified version of the Kolmogorov-Smirnov (KS) test is presented as a tool to assess whether a specified, although arbitrary, probability model is unsuitable to describe the underlying distribution of a set of observations. The KS test computes distances between points of the sample cumulative distribution function and the hypothetical one as absolute differences between them, and then consi...

A Kolmogorov-Smirnov test for the molecular clock based on Bayesian ensembles of phylogenies

Divergence date estimates are central to understanding evolutionary processes and depend, in the case of molecular phylogenies, on tests of molecular clocks. Here we propose two non-parametric tests of strict and relaxed molecular clocks built upon a framework that uses the empirical cumulative distribution (ECD) of branch lengths obtained from an ensemble of Bayesian trees and well known non-para...

Journal:
  • Expert Syst. Appl.

Volume 42

Publication date: 2015